It amazes me that keyword density is still touted as an important on-page metric. The argument in its favor goes something like this: “The more times a term appears in a document, the higher the keyword density for that term. The higher the keyword density, the more relevant a document becomes for that term.”
The calculation used to back up this claim
The keyword density of term i (KDi) = the number of times term i occurs in a document (TFi) divided by the total word count of the document (WC). Thus, the keyword density of the word ‘optimization’ when it is repeated 4 times in a 100-word document would be:
4/100 = 0.04, giving a keyword density (KDi) of 4%.
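Here is a minimal Python sketch of the same calculation. It assumes a simple tokenizer that lowercases the text and treats runs of word characters (with internal apostrophes) as words. Real tools count words in different ways, so their figures may not match exactly. The names `tokenize` and `keyword_density` are illustrative.

```python
import re

# Assumption: a "word" is a run of word characters, optionally joined by apostrophes.
WORD_RE = re.compile(r"\w+(?:'\w+)*")

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words."""
    return WORD_RE.findall(text.lower())

def keyword_density(term: str, text: str) -> float:
    """KDi = TFi / WC for a single-word term; 0.0 for an empty document."""
    words = tokenize(text)
    if not words:
        return 0.0
    tf = words.count(term.lower())  # TFi: occurrences of the term
    return tf / len(words)          # WC: total word count

# The worked example: 'optimization' 4 times in a 100-word document.
doc = " ".join(["optimization"] * 4 + ["filler"] * 96)
print(f"{keyword_density('optimization', doc):.0%}")  # 4%
```

The sketch only handles single-word terms. Multi-word phrases would need a different counting rule, and that is another assumption tools tend to make silently.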
Looking at it this way, keyword density is a simplistic on-page measure that expresses a term/document ratio. This ratio fails to take into account the keyword density of any other document that may be competing for the same search space. To overcome this obvious failing, keyword density advocates compare their document to the top-ranking documents for the target term and adapt the page to match or better the keyword density of that document set. The result: Spamglish!
How to screw up your pages, destroy your credibility, aggravate your visitors, and justify the necessity for the back button.
I think we have all seen documents that have undergone this process. In an attempt to achieve some magical keyword density, the document has become difficult to read and looks spammy. Ultimately, legibility and usability suffer along with credibility and conversions.
If search engines derived document relevancy mainly from keyword density, then all you would have to do is repeat your target term over and over to get pages to rank. Search engines are not that dumb. Keyword density ratios fail to take into account the relative position (contextual relevancy) and relative dispersion (distribution) of terms in the document, or how many other documents are relevant for the term.
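A small Python sketch makes two of these failings concrete. The three 20-word documents are made up for illustration. The tokenizer (lowercase, split on word characters) is an assumption, and `term_positions` is a hypothetical helper. The sketch says nothing about how any search engine actually scores pages. It only shows what the ratio cannot see.

```python
import re

def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and split on word characters.
    return re.findall(r"\w+(?:'\w+)*", text.lower())

def keyword_density(term: str, text: str) -> float:
    words = tokenize(text)
    return words.count(term.lower()) / len(words) if words else 0.0

def term_positions(term: str, text: str) -> list[int]:
    """Word offsets where the term occurs: its position and dispersion."""
    return [i for i, w in enumerate(tokenize(text)) if w == term.lower()]

# Three hypothetical 20-word documents.
clustered = " ".join(["coffee"] * 4 + ["word"] * 16)
spread = " ".join("coffee" if i % 5 == 0 else "word" for i in range(20))
stuffed = " ".join(["coffee"] * 20)

for name, doc in [("clustered", clustered), ("spread", spread), ("stuffed", stuffed)]:
    print(name, keyword_density("coffee", doc), term_positions("coffee", doc))
```

The clustered and spread documents both score 0.2, even though the term sits in very different places. The stuffed document reaches the maximum possible score of 1.0 while saying nothing at all.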
Furthermore, keyword density ignores internal linking, site structure, backlinks, how long users stay on the page, domain age, etc. Keyword density is a worthless metric; keyword density tools are a waste of time, and people who chase some mystical on-page keyword density are probably doing more harm than good.
We’re just talking semantics
When writing content, a natural pattern of keywords, verbs, nouns, and synonyms evolves. This is where the true meaning of the page lies. For example, the term ‘Java’ could refer to coffee, a programming language, or a geographic location. It is the words that surround the term on a page that help define context. Simply counting the number of times Java appears in a document will never uncover this.
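One simple way to look at "the words that surround the term" is to count the words within a few positions of each occurrence. The Python sketch below assumes a window of three words and uses a tiny illustrative stopword list. The two sample pages are made up. This is a plain co-occurrence count, not a description of how Google models context.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption); real tools use much longer ones.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "with", "to", "in"}

def context_words(term: str, text: str, window: int = 3) -> Counter:
    """Count words within `window` positions either side of each occurrence of `term`."""
    words = re.findall(r"\w+(?:'\w+)*", text.lower())
    term = term.lower()
    counts: Counter = Counter()
    for i, w in enumerate(words):
        if w != term:
            continue
        neighbors = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        counts.update(n for n in neighbors if n != term and n not in STOPWORDS)
    return counts

coffee_page = "Java is a dark roast coffee. Brew Java beans slowly."
code_page = "Java is a programming language. Compile Java code with the JDK."

print(sorted(context_words("java", coffee_page)))
# ['beans', 'brew', 'coffee', 'dark', 'roast', 'slowly']
print(context_words("java", code_page).most_common(1))
# [('programming', 2)]
```

'Java' appears twice on each page, so counting it tells you nothing about which Java is meant. The neighbors do: roast, brew and beans on one page, and programming, compile and code on the other.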
Let’s assume that you have created a document about coffee and you have ensured that the keywords are in all the relevant places: page title, meta tags,
Semantic Search
Using Google’s semantic operator (the tilde, ~), you can discover words that Google deems semantically related to your target terms. Search for ‘coffee’ and Google will tell you it has indexed 313,000,000 pages that contain the word ‘coffee’. Now search for ~coffee and that number rises to 655,000,000 pages, more than double. These extra pages contain words that Google sees as related; those words are dotted around the results and highlighted in bold.
You can peel away the layers of semantically related terms by adapting your search to ignore some terms. Try ~coffee -coffee. This brings up a new set of results and the page count drops to 209,000,000. You have now excluded all the pages that contain the word ‘coffee’. Although the pages returned don’t contain the term, they are still deemed related.
Closer inspection of the results will show that the first highlighted term is now cafe, so we can again refine our search to ~coffee -coffee -cafe. The page count drops off to 2,170,000 and uncovers our next word, starbucks. Rinse and repeat.
~coffee -coffee -cafe -starbucks
~coffee -coffee -cafe -starbucks -caffeine
~coffee -coffee -cafe -starbucks -caffeine -koffee
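The peeling procedure is a simple loop. You build the query, read off the first new highlighted word, add it to the exclusions, and stop at a dead end. The Python sketch below assumes a caller-supplied `next_related` function that stands in for "run the query and note the first new bolded word". It does not query any search engine. The lookup table is hand-entered from the coffee walkthrough above and is not live data. `build_query` and `peel_related_terms` are hypothetical names.

```python
from typing import Callable, Optional

def build_query(seed: str, excluded: list[str]) -> str:
    """Build a query such as '~coffee -coffee -cafe'."""
    return " ".join([f"~{seed}", f"-{seed}"] + [f"-{w}" for w in excluded])

def peel_related_terms(seed: str,
                       next_related: Callable[[str], Optional[str]],
                       max_rounds: int = 10) -> list[str]:
    """Repeatedly exclude the newest related word until nothing new appears."""
    found: list[str] = []
    for _ in range(max_rounds):
        word = next_related(build_query(seed, found))
        if word is None or word == seed or word in found:
            break  # dead end: nothing new to exclude
        found.append(word)
    return found

# Hand-entered from the coffee walkthrough (not live search results):
# each query maps to the first new highlighted word it surfaced.
observed = {
    "~coffee -coffee": "cafe",
    "~coffee -coffee -cafe": "starbucks",
    "~coffee -coffee -cafe -starbucks": "caffeine",
    "~coffee -coffee -cafe -starbucks -caffeine": "koffee",
}

print(peel_related_terms("coffee", observed.get))
# ['cafe', 'starbucks', 'caffeine', 'koffee']
```

Running the same loop with other seeds and merging the results gives a combined word list. The `max_rounds` cap is a safety assumption so the loop cannot run forever.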
Now test a second set starting with ‘cafe’ as the seed, which gives us:
~cafe -cafe
~cafe -cafe -coffee
~cafe -cafe -coffee -café
Next, start with ‘starbucks’. This one leads to a dead end (no new words).
~starbucks -starbucks
~starbucks -starbucks -coffee
And finally ‘caffeine’:
~caffeine -caffeine
~caffeine -caffeine -coffee
~caffeine -caffeine -coffee -stimulants
So we have a word list that contains:
- coffee
- cafe
- starbucks
- caffeine
- café
- stimulants
- stimulant
N.B. When a plural occurs, I would also include the singular form, as both stem from the same root. For the same reason, I would include ‘stimulating’.
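The plural rule in the note can be sketched in Python as a naive suffix check. This is a deliberate simplification. It only strips a trailing 's', so it needs a hand-maintained `keep_as_is` set for words like ‘starbucks’ that are not plurals. It also will not connect ‘stimulating’ to ‘stimulant’. A real stemmer, such as the Porter stemmer in NLTK, is the usual tool for grouping words by root. The function name `add_singulars` is hypothetical.

```python
def add_singulars(words: list[str],
                  keep_as_is: frozenset[str] = frozenset()) -> list[str]:
    """Naive sketch: for each word ending in a single 's', also add it without the 's'."""
    result: list[str] = []
    for w in words:
        if w not in result:
            result.append(w)
        # Skip 'ss' endings (e.g. 'espresso' is fine, 'glass' is not a plural)
        # and any word the caller marks as not a plural.
        if w.endswith("s") and not w.endswith("ss") and w not in keep_as_is:
            singular = w[:-1]
            if singular not in result:
                result.append(singular)
    return result

found = ["coffee", "cafe", "starbucks", "caffeine", "café", "stimulants"]
print(add_singulars(found, keep_as_is=frozenset({"starbucks"})))
# ['coffee', 'cafe', 'starbucks', 'caffeine', 'café', 'stimulants', 'stimulant']
```

With ‘starbucks’ protected, the sketch reproduces the word list above. Without the exception set it would also add ‘starbuck’, which shows why a blind suffix rule needs human review.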
I find it easier to write and research content when I work from a word list in this way. Some seed words can develop a sizable list that opens up a wider selection of resources to research. Documents are more complete and relevant without having to resort to keyword stuffing, plus you satisfy search engines and the reader.
Further Reading
http://www.knowledgesearch.org/lsi/cover_page.htm
http://www.miislita.com/fractals/keyword-density-optimization.html
Neither of these documents is new, but both remain as relevant today as the day they were written.